Exploring a million data points with an interactive Leaflet map

Note: Some familiarity with cartography and intermediate R programming will help with the technical terms used here.

“The purpose of visualization is insight, not pictures.” - Ben Shneiderman

Motivation

In my last post, plotting large data with ggplot2, I tested static spatial mapping in R using about a million geocoded records from the New York City Taxi and Limousine Commission. The data was collected from taxi cab GPS units at customer pickup and drop-off locations. The commission also makes a shape file available that contains the boundaries of all 261 taxi zones in New York City.

The test, which I dubbed a ‘stress’ test, was partially successful: I was able to plot about 439,000 data points in the viewable panel with ggmap, while the rest of the points fell outside the viewable panel. ggmap did prove capable of plotting all one million geocodes if we zoom out far enough to fit every point. But a static plot with that much data produces undecipherable overlaps, making the visualization unusable for analysis.

In this post I will use Leaflet, “an open source JavaScript library to build [interactive] mapping applications,” to plot 1,000,000 geocodes from New York City yellow taxi cabs for the month of January 2016.

Without further ado, let’s get right to it.

Data Preparation

I will not go over the details of how the data is prepared for plotting, since data preparation was discussed in the previous post. However, a new variable that populates the popup was added for the interactive plot, so the following script starts from there.

Load the libraries and data required to plot the interactive map.

As always, we start by loading the required packages, the data that contains the long/lat for the drop-off locations, and a GeoJSON file (“an open standard format designed for representing simple geographical features, along with their non-spatial attributes, based on JavaScript Object Notation.”) with the taxi zone shapes provided by the NYC taxi commission.

#load library
library("leaflet")          # Create a Leaflet map widget
library("geojsonio")        # Convert various data formats to/from GeoJSON or TopoJSON.
library("dplyr")            # Data cleaning
library("mapview")          # View spatial objects interactively

setwd("~/Documents/Data-Science/Blog/Blog8")
#load the data
df_ride_total <- read.csv("./data/df_ride_total25k.csv")
ny_taxi_zone_geojson <- readLines("./data/taxi_zones.geojson") %>% paste(collapse = "\n")

# read the taxi zones as a spatial (sp) object for the polygon layer
dat1 <- geojson_read("./data/taxi_zones.geojson", what = "sp")
#mapview(dat1)  
#head(dat1)
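
Before plotting, it can help to confirm that the drop-off columns are present and that the coordinates look plausible, since raw GPS data may contain missing or placeholder values. The following is a minimal sketch on a toy data frame; the helper name `check_dropoffs` and the bounding box values are assumptions for illustration, not part of the original workflow.

```r
# Hypothetical helper (not from the original post): counts how many
# drop-off points fall inside a rough, assumed bounding box around NYC.
check_dropoffs <- function(df,
                           lng_range = c(-74.3, -73.6),  # assumed bounds
                           lat_range = c(40.4, 41.0)) {  # assumed bounds
  # Fail early if the expected columns are missing
  stopifnot(all(c("dropoff_longitude", "dropoff_latitude") %in% names(df)))

  inside <- !is.na(df$dropoff_longitude) & !is.na(df$dropoff_latitude) &
    df$dropoff_longitude >= lng_range[1] & df$dropoff_longitude <= lng_range[2] &
    df$dropoff_latitude  >= lat_range[1] & df$dropoff_latitude  <= lat_range[2]

  list(n_total = nrow(df), n_inside = sum(inside), n_outside = sum(!inside))
}

# Toy data: two plausible points, one (0, 0) placeholder, one missing value
toy <- data.frame(
  dropoff_longitude = c(-73.98, -73.95, 0, NA),
  dropoff_latitude  = c(40.75, 40.78, 0, 40.7)
)
res <- check_dropoffs(toy)
```

With the real data, the same call would be `check_dropoffs(df_ride_total)`, assuming the column names used elsewhere in this post.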

Add a feature for the geocode popup

Once plotted, each drop-off point can be identified by its long/lat information (rounded to two decimal places). To display longitude and latitude in a popup, we add a feature to the dataset with the following code.

df_ride_total <- df_ride_total %>% mutate( popupInfo1 = paste(
                                            "lat",   round(dropoff_latitude,2), ",",
                                            "long",  round(dropoff_longitude,2)
                                            )
                                  )
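
One thing to keep in mind is that 0.01 degrees of latitude is roughly 1.1 km, so labels rounded to two decimals will be shared by many nearby points. The sketch below, on made-up coordinates, compares the two-decimal label with a five-decimal variant; the column names `popup_2dp` and `popup_5dp` are hypothetical, and base R is used only so the example runs without extra packages.

```r
# Two made-up drop-off points a short distance apart
toy <- data.frame(
  dropoff_latitude  = c(40.78306, 40.78412),
  dropoff_longitude = c(-73.97125, -73.97098)
)

# Same construction as the popup above: rounded to two decimals
toy$popup_2dp <- paste(
  "lat",  round(toy$dropoff_latitude, 2), ",",
  "long", round(toy$dropoff_longitude, 2)
)

# Alternative (an assumption, not the post's choice): five decimals via sprintf
toy$popup_5dp <- sprintf(
  "lat %.5f, long %.5f",
  toy$dropoff_latitude, toy$dropoff_longitude
)

# The two-decimal labels collide; the five-decimal labels do not
same_2dp <- toy$popup_2dp[1] == toy$popup_2dp[2]
same_5dp <- toy$popup_5dp[1] == toy$popup_5dp[2]
```

Whether the extra precision is worth it depends on how the popup is used; for a quick visual check, the shorter label may be easier to read.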

Visualizing a million data points on an interactive map

Finally, we are ready to plot and interactively examine the one million data points and see whether R graphics can handle the stress test. To make the map even more intuitive, the geocoded taxi drop-off locations are layered on top of the GeoJSON shape file. The shape file, which came in GeoJSON format, was loaded into R with the geojson_read function from the geojsonio library.

Figure 1: A GIF recorded while testing the various features. Try the same actions on Figure 2.

When rendering the plot with ‘knitr’, 1,000,000 points take a long time to render if the computer does not have enough memory. I have enough, and the GIF above was recorded with all one million points loaded. However, to make it easier for those who do not, the following interactive map uses 25,000 geocode points. Go ahead and explore by zooming and clicking on various parts of the map. Very nifty!
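
The post does not say how the 25,000-point file was produced. One possibility is a simple random sample of the full table, sketched below on synthetic coordinates; the seed, the object names `df_full` and `df_sample`, and the coordinate ranges are all assumptions for illustration.

```r
set.seed(2016)  # arbitrary seed, only so the sketch is repeatable

# Synthetic stand-in for the full table of one million drop-off points
n_full <- 1e6
df_full <- data.frame(
  dropoff_longitude = runif(n_full, -74.05, -73.75),
  dropoff_latitude  = runif(n_full, 40.60, 40.90)
)

# Draw 25,000 distinct rows (sampling without replacement)
df_sample <- df_full[sample.int(nrow(df_full), 25000), ]
```

A random sample keeps the overall spatial pattern of the full data, whereas taking the first 25,000 rows could over-represent whatever time period comes first in the file.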

Figure 2: An interactive Leaflet-based New York City yellow taxi drop-off map for January 2016.

# Layer the drop-off points on top of the taxi zone polygons
pp_leaflet_spatial_1 <- leaflet(df_ride_total) %>%
  # Base layers: default OpenStreetMap tiles plus three provider tile sets
  addTiles(group = "OpenStreetMap.BlackAndWhite (default)") %>%
  addProviderTiles("Hydda.Full", group = "Full") %>%
  addProviderTiles("Stamen.Toner", group = "Toner") %>%
  addProviderTiles("Esri.WorldStreetMap", group = "WorldStreetMap") %>%
  setView(lng = -73.97125, lat = 40.78306, zoom = 11) %>%   # geocode("manhattan, NY")
  # Taxi zone outlines; popupTable() builds the popup from the zone attributes
  addPolygons(data = dat1, popup = popupTable(dat1), color = "green", group = "Outline") %>%
  # Clustered drop-off markers with the lat/long popup
  addCircleMarkers(
    ~dropoff_longitude,
    ~dropoff_latitude,
    group = "Markers",
    radius = 5,
    color = "red",
    fill = TRUE,
    opacity = 0.8,
    popup = ~popupInfo1,
    popupOptions = popupOptions(closeButton = TRUE),   # popup behaviour (close button)
    clusterOptions = markerClusterOptions()
  ) %>%
  # Layer control: pick one base map, toggle the overlays
  addLayersControl(
    baseGroups = c("OpenStreetMap.BlackAndWhite (default)", "Full", "Toner", "WorldStreetMap"),
    overlayGroups = c("Markers", "Outline"),
    position = "topleft"
  )
pp_leaflet_spatial_1


Notice the layer control in the top left corner of the map. When selected, it offers a choice of four tile sets from different map providers: in addition to the default OpenStreetMap tiles, there are Esri WorldStreetMap, Stamen Toner, and Hydda Full. The taxi zone outlines and the geocode points can also be selected or deselected for different views. These capabilities make interactive maps more intuitive and allow a better examination of large amounts of data in different parts of the city.
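
To see the layer control in isolation, here is a minimal sketch with one base layer and one overlay on two made-up points. The group names and coordinates are placeholders; `layersControlOptions(collapsed = FALSE)` and `hideGroup()` are optional extras that the map above does not use, shown here only as one way to keep the control expanded and start with an overlay switched off.

```r
library("leaflet")

# Two made-up points, only to have something to toggle
pts <- data.frame(lng = c(-73.97, -73.99), lat = c(40.78, 40.75))

m <- leaflet(pts) %>%
  addTiles(group = "OSM") %>%
  addCircleMarkers(~lng, ~lat, group = "Markers") %>%
  addLayersControl(
    baseGroups    = "OSM",        # radio buttons: exactly one base map at a time
    overlayGroups = "Markers",    # checkboxes: overlays can be toggled freely
    options       = layersControlOptions(collapsed = FALSE)
  ) %>%
  hideGroup("Markers")            # overlay starts unchecked
```

Printing `m` in an interactive session shows the map; the overlay only appears once its checkbox is ticked.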

In addition to the popup that shows the geolocation of each dot on a mouse click, the taxi zones can also be clicked to get a popup with the taxi zone number and area size. This is possible because the popupTable function in the mapview package builds the popup from the properties stored with each feature in the GeoJSON file.
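
To give a rough idea of what such a popup contains, the sketch below builds a small HTML string per zone from an attribute table by hand. This is not how popupTable is implemented; it is a simplified stand-in, and the column names `zone_id` and `zone_name` and their values are hypothetical.

```r
# Hypothetical attribute table, standing in for the properties of each zone
zones <- data.frame(
  zone_id   = c(1, 2),
  zone_name = c("Zone A", "Zone B"),
  stringsAsFactors = FALSE
)

# One HTML snippet per zone; a character vector like this can be passed
# to the popup argument of addPolygons()
popup_html <- paste0(
  "<b>ID:</b> ", zones$zone_id,
  "<br/><b>Name:</b> ", zones$zone_name
)
```

Building the string by hand gives control over which attributes appear, at the cost of the ready-made table layout.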

Take Away

As demonstrated, the Leaflet JavaScript library, used here from R (a programming language for statistical computing and graphics), is capable of plotting 1,000,000 data points on a small screen, provided the computer has enough memory (RAM). Layering the shape file, the map tiles, and the geocodes, together with Leaflet’s zooming capability, makes for a much easier exploration experience.

The advantage of interactive mapping with Leaflet for large datasets is its ability to cluster, zoom, and click on data points interactively.
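
As a rough way to reason about the memory requirement, the sketch below measures how much space a million longitude/latitude pairs take inside R. It uses placeholder zeros rather than the taxi data, and it only covers the raw coordinates; the cost of turning them into a widget and drawing them in the browser is not measured here and is presumably larger.

```r
n <- 1e6

# Placeholder table with the same shape as the drop-off coordinates
coords <- data.frame(
  dropoff_longitude = numeric(n),
  dropoff_latitude  = numeric(n)
)

# Two double-precision columns at 8 bytes per value, plus a little overhead
size_mb <- as.numeric(object.size(coords)) / 1024^2
```

Comparing this figure with the memory actually in use while the map renders gives a feel for how much of the load comes from rendering rather than from the data itself.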

Contact

If you need consultation on this kind of work, feel free to contact ability.giday@gmail.com.

References

This is a fully reproducible markdown document generated using the RStudio IDE.